UCell signature enrichment - interacting with Seurat
In this demo, we will apply UCell to evaluate gene signatures in single-cell PBMC data. We will use a subset of the data from Hao and Hao et al., bioRxiv 2020, which comprises multiple immune cell types at different levels of resolution. Because these cells were characterized both in terms of transcriptomes (using scRNA-seq) and surface proteins (using a panel of antibodies), the cell type annotations should be of very high quality. To demonstrate how UCell can simply and accurately evaluate gene signatures on a query dataset, we will apply it directly to the Seurat object from Hao et al. and compare the signature scores to the original cluster annotations by the authors.
Installation
Install UCell
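The installation commands are not shown here, so the following is a minimal sketch rather than the authors' exact setup. It assumes that UCell is installed from Bioconductor via BiocManager and that Seurat and patchwork are already installed; patchwork provides the `|` operator used later to place plots side by side.

```r
# Assumption: install UCell from Bioconductor (skipped if already installed)
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
if (!requireNamespace("UCell", quietly = TRUE)) {
    BiocManager::install("UCell")
}

library(Seurat)     # DimPlot, VlnPlot, FeaturePlot, AddModuleScore
library(UCell)      # AddModuleScore_UCell
library(patchwork)  # `|` operator for combining plots
```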
Query single-cell data
Obtain a downsampled version (20,000 T cells) of the data from Hao and Hao et al., bioRxiv 2020 at the following link: https://drive.switch.ch/index.php/s/3kM5PQ0tQaG6d6A
Then load the object and visualize the clustering annotation by the authors.
pbmc.Tcell <- readRDS("pbmc_multimodal.downsampled20k.Tcell.seurat.RNA.rds")
DimPlot(object = pbmc.Tcell, reduction = "wnn.umap", group.by = "celltype.l2", label = TRUE,
    label.size = 3, repel = TRUE)
Score signatures using UCell
Define some signatures for T cell subtypes
markers <- list()
markers$Tcell_CD4 <- c("CD4", "CD40LG")
markers$Tcell_CD8 <- c("CD8A", "CD8B")
markers$Tcell_Treg <- c("FOXP3", "IL2RA")
markers$Tcell_MAIT <- c("KLRB1", "SLC4A10", "NCR3")
markers$Tcell_gd <- c("TRDC", "TRGC1", "TRGC2", "TRDV1")
markers$Tcell_NK <- c("FGFBP2", "SPON2", "KLRF1", "FCGR3A", "KLRD1", "TRDC")
pbmc.Tcell <- AddModuleScore_UCell(pbmc.Tcell, features = markers)
signature.names <- paste0(names(markers), "_UCell")
VlnPlot(pbmc.Tcell, features = signature.names, group.by = "celltype.l1")
How do signatures compare to original annotations
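One way to make this comparison quantitative is to summarize each signature score per annotated cell type. The sketch below uses a small invented data frame in place of `pbmc.Tcell@meta.data`; the cell type labels and score values are made up for illustration, and only the column naming (`celltype.l2`, `*_UCell`) follows this demo.

```r
# Toy stand-in for pbmc.Tcell@meta.data (all values are invented)
meta <- data.frame(
    celltype.l2 = c("CD8 Naive", "CD8 Naive", "CD8 Naive", "Treg", "Treg", "Treg"),
    Tcell_CD8_UCell = c(0.70, 0.80, 0.90, 0.00, 0.10, 0.05),
    Tcell_Treg_UCell = c(0.00, 0.05, 0.10, 0.60, 0.50, 0.70)
)

# Select all UCell score columns by their "_UCell" suffix
score.cols <- grep("_UCell$", colnames(meta), value = TRUE)

# Median score of each signature within each annotated cell type
median.scores <- aggregate(meta[score.cols],
    by = list(celltype.l2 = meta$celltype.l2), FUN = median)
print(median.scores)
```

On the real object, the same call could be applied to `pbmc.Tcell@meta.data`; a signature that agrees with the annotation should have a high median in its matching cell type and low medians elsewhere.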
Idents(pbmc.Tcell) <- "celltype.l2"
DimPlot(object = pbmc.Tcell, reduction = "wnn.umap", group.by = "celltype.l2", label.size = 3,
    repel = TRUE, label = TRUE)
Compare to AddModuleScore from Seurat
Seurat comes with a method for signature enrichment analysis, AddModuleScore. This method is very fast, but the score is highly dependent on the composition of the dataset. Here we will apply AddModuleScore with a simple CD8 T cell signature to two datasets: a set composed of different T cell types (pbmc.Tcell) and a subset of this dataset only comprising the CD8 T cells (pbmc.Tcell.CD8).
First, generate a subset only comprising CD8 T cells (pbmc.Tcell.CD8)
Idents(pbmc.Tcell) <- "celltype.l1"
pbmc.Tcell.CD8 <- subset(pbmc.Tcell, idents = c("CD8 T"))
DimPlot(object = pbmc.Tcell.CD8, reduction = "wnn.umap", group.by = "celltype.l2",
    label = TRUE, label.size = 3, repel = TRUE) + NoLegend()
Note that applying the same signature to the complete set or to the CD8 T subset gives very different results. When other cell types are present, the score distribution for CD8 T cells has a median close to 1, but the same CD8 T cells evaluated alone give a zero-centered distribution of scores. It may be undesirable to have a score that changes so dramatically for the same cells depending on the composition of the dataset.
markers.cd8 <- list(Tcell_CD8 = c("CD8A", "CD8B"))
pbmc.Tcell <- AddModuleScore(pbmc.Tcell, features = markers.cd8, name = "Tcell_CD8_Seurat")
a <- VlnPlot(pbmc.Tcell, features = "Tcell_CD8_Seurat1")
pbmc.Tcell.CD8 <- AddModuleScore(pbmc.Tcell.CD8, features = markers.cd8, name = "Tcell_CD8_Seurat")
b <- VlnPlot(pbmc.Tcell.CD8, features = "Tcell_CD8_Seurat1")
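a | b
AddModuleScore summary for CD8 T cells scored within the complete T cell set:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
-0.6057  0.5149  0.9236  0.8756  1.2673  2.3228
AddModuleScore summary for the same CD8 T cells scored alone:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-1.65105 -0.44921 -0.03485 -0.09280  0.30758  1.39551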
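The shift in these summaries can be reproduced with a toy calculation. The sketch below is not Seurat's AddModuleScore algorithm; it uses a hypothetical helper, `relative_score`, that subtracts the dataset-wide mean, which mimics only the property that the reference level is estimated from all cells in the dataset. The expression values are invented.

```r
# Hypothetical helper: score relative to the dataset-wide mean.
# This is NOT Seurat's AddModuleScore; it only mimics its dependence on a
# reference level estimated from all cells in the dataset.
relative_score <- function(signature.expr) {
    signature.expr - mean(signature.expr)
}

# Invented signature expression: three CD8 T cells and six other T cells
expr.all <- c(cd8_1 = 2, cd8_2 = 3, cd8_3 = 4,
              other_1 = 0, other_2 = 0, other_3 = 0,
              other_4 = 0, other_5 = 0, other_6 = 0)
cd8.cells <- c("cd8_1", "cd8_2", "cd8_3")

score.mixed <- relative_score(expr.all)[cd8.cells]  # CD8 cells scored with other cells
score.alone <- relative_score(expr.all[cd8.cells])  # same cells scored alone

print(median(score.mixed))
print(median(score.alone))
```

The same three cells receive clearly positive scores when scored together with non-expressing cells and scores centered on zero when scored alone, because the reference level moves with the composition of the dataset.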
The UCell score is based on gene rankings within each cell and is therefore not affected by the composition of the query dataset. Note that the score distribution is nearly identical for the same cell population in different datasets (small differences are due to random resolution of rank ties).
pbmc.Tcell <- AddModuleScore_UCell(pbmc.Tcell, features = markers.cd8)
a <- VlnPlot(pbmc.Tcell, features = "Tcell_CD8_UCell")
pbmc.Tcell.CD8 <- AddModuleScore_UCell(pbmc.Tcell.CD8, features = markers.cd8)
b <- VlnPlot(pbmc.Tcell.CD8, features = "Tcell_CD8_UCell")
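a | b
UCell summary for CD8 T cells scored within the complete T cell set:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.0000  0.3803  0.5193  0.5294  0.7733  0.9372
UCell summary for the same CD8 T cells scored alone:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.0000  0.3803  0.5193  0.5294  0.7733  0.9372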
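To see why a rank-based score does not depend on which other cells are present, consider the simplified sketch below. It is not the UCell implementation: `rank_score` is a hypothetical helper that computes a normalized Mann-Whitney U statistic from the gene ranks of a single cell, without any cap on ranks and with ties averaged rather than broken at random. The expression matrix is invented.

```r
# Hypothetical helper: normalized Mann-Whitney U score for one cell.
# Simplified sketch, not the UCell implementation (no rank cap, ties averaged).
rank_score <- function(expr, signature) {
    n <- length(signature)
    n.genes <- length(expr)
    # Rank genes within this cell only: rank 1 = highest expression
    ranks <- rank(-expr, ties.method = "average")
    u <- sum(ranks[signature]) - n * (n + 1) / 2
    # 1 when signature genes are top-ranked, 0 when they are bottom-ranked
    1 - u / (n * (n.genes - n))
}

# Invented expression matrix: genes in rows, cells in columns
expr <- matrix(
    c(5, 4, 0, 1, 2,
      4, 5, 1, 0, 2,
      0, 1, 5, 4, 2,
      1, 0, 4, 5, 2),
    nrow = 5,
    dimnames = list(c("CD8A", "CD8B", "CD4", "CD40LG", "ACTB"),
                    c("cd8_1", "cd8_2", "cd4_1", "cd4_2"))
)
cd8.signature <- c("CD8A", "CD8B")
cd8.cells <- c("cd8_1", "cd8_2")

scores.all <- apply(expr, 2, rank_score, signature = cd8.signature)
scores.cd8 <- apply(expr[, cd8.cells], 2, rank_score, signature = cd8.signature)

print(scores.all)
print(scores.cd8)
```

Because each column is ranked on its own, removing the CD4 cells from the matrix leaves the scores of the CD8 cells unchanged.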
We can have a look at the distribution of the scores for all T cells:
Idents(pbmc.Tcell) <- "celltype.l1"
DimPlot(object = pbmc.Tcell, reduction = "wnn.umap", group.by = "celltype.l2", label = TRUE,
    label.size = 3, repel = TRUE)
FeaturePlot(pbmc.Tcell, reduction = "wnn.umap", features = c("Tcell_CD8_UCell", "Tcell_CD8_Seurat1"),
    ncol = 2, order = TRUE)
…and on the CD8 T cell subset only:
Idents(pbmc.Tcell.CD8) <- "celltype.l2"
DimPlot(object = pbmc.Tcell.CD8, reduction = "wnn.umap", group.by = "celltype.l2",
    label = TRUE, label.size = 3, repel = TRUE) + NoLegend()
FeaturePlot(pbmc.Tcell.CD8, reduction = "wnn.umap", features = c("Tcell_CD8_UCell",
    "Tcell_CD8_Seurat1"), ncol = 2, order = TRUE)
Further reading
For more examples of UCell functionalities see THIS DEMO
The code and the package are available at the UCell GitHub repository